The idea is to create a ‘custom’ distance matrix to induce the groupings.
\(c_1\) is the distance between ASVs of the same species (unassigned ASVs excluded)
\(c_2\) is the distance between ASVs assigned to different species
\(c_3\) is the distance between an assigned ASV and an unassigned ASV
\(c_4\) is the distance between two unassigned ASVs
For this document we will use only the following configuration:
\(c_1=1\), \(c_2=1000\), \(c_3=10\), \(c_4=10\)
A priori, an unassigned ASV is as ‘likely’ to agglomerate with other unassigned ASVs as with ASVs of a known species, because \(c_3 = c_4\). The large \(c_2\) makes it harder for ASVs of different known species to group together.
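To make the configuration concrete, here is a minimal NumPy sketch of how such a matrix could be filled in. The notebook's own code is not shown, so the function name `custom_distance`, the use of `None` for unassigned ASVs, and the zero diagonal are all assumptions made for illustration.

```python
import numpy as np

def custom_distance(species, c1=1.0, c2=1000.0, c3=10.0, c4=10.0):
    """Block-structured ASV distance matrix (hypothetical helper).

    `species` holds one label per ASV; None marks an unassigned ASV.
    """
    # Map species names to integer codes; unassigned ASVs get -1.
    names = sorted({s for s in species if s is not None})
    index = {name: i for i, name in enumerate(names)}
    codes = np.array([-1 if s is None else index[s] for s in species])

    assigned = codes >= 0
    same = codes[:, None] == codes[None, :]
    both = assigned[:, None] & assigned[None, :]
    neither = ~assigned[:, None] & ~assigned[None, :]

    n = len(species)
    d = np.full((n, n), c3, dtype=float)  # default: assigned vs unassigned
    d[both & same] = c1                   # same assigned species
    d[both & ~same] = c2                  # different assigned species
    d[neither] = c4                       # unassigned vs unassigned
    np.fill_diagonal(d, 0.0)              # assumption: zero self-distance
    return d

# Toy example: two ASVs of species "A", one of "B", two unassigned.
d = custom_distance(["A", "A", "B", None, None])
print(d)
```

With the default values, an unassigned ASV sits at the same distance (10) from every other ASV that is not itself, which is the a priori indifference described above.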
Run 0 stress 0.06468084
Run 1 stress 0.07480603
Run 2 stress 0.0730409
Run 3 stress 0.07345601
Run 4 stress 0.07522215
Run 5 stress 0.07467133
Run 6 stress 0.07474017
Run 7 stress 0.07560192
Run 8 stress 0.07449577
Run 9 stress 0.07404174
Run 10 stress 0.07397977
Run 11 stress 0.07262368
Run 12 stress 0.07255114
Run 13 stress 0.07532366
Run 14 stress 0.07304862
Run 15 stress 0.07680197
Run 16 stress 0.07180818
Run 17 stress 0.07454453
Run 18 stress 0.07792136
Run 19 stress 0.07446214
Run 20 stress 0.07575259
*** Best solution was not repeated -- monoMDS stopping criteria:
20: no. of iterations >= maxit
Run 0 stress 0.08168081
Run 1 stress 0.08429471
Run 2 stress 0.08720077
Run 3 stress 0.08838941
Run 4 stress 0.08728998
Run 5 stress 0.08835142
Run 6 stress 0.08889119
Run 7 stress 0.08553928
Run 8 stress 0.08729163
Run 9 stress 0.08936481
Run 10 stress 0.0900273
Run 11 stress 0.08680416
Run 12 stress 0.0872725
Run 13 stress 0.0857184
Run 14 stress 0.083441
Run 15 stress 0.0866785
Run 16 stress 0.08572318
Run 17 stress 0.08578093
Run 18 stress 0.08780596
Run 19 stress 0.08687016
Run 20 stress 0.085108
*** Best solution was not repeated -- monoMDS stopping criteria:
20: no. of iterations >= maxit
Run 0 stress 0.08718612
Run 1 stress 0.09025029
Run 2 stress 0.08938301
Run 3 stress 0.09251931
Run 4 stress 0.09070787
Run 5 stress 0.09289893
Run 6 stress 0.09116608
Run 7 stress 0.0906269
Run 8 stress 0.08901612
Run 9 stress 0.09330492
Run 10 stress 0.09101909
Run 11 stress 0.094913
Run 12 stress 0.08973203
Run 13 stress 0.09116532
Run 14 stress 0.09227395
Run 15 stress 0.09204363
Run 16 stress 0.09165805
Run 17 stress 0.09108332
Run 18 stress 0.09168284
Run 19 stress 0.09580641
Run 20 stress 0.09207231
*** Best solution was not repeated -- monoMDS stopping criteria:
20: no. of iterations >= maxit
Run 0 stress 0.04026351
Run 1 stress 0.04973186
Run 2 stress 0.05058196
Run 3 stress 0.05179752
Run 4 stress 0.04693137
Run 5 stress 0.04758069
Run 6 stress 0.0511412
Run 7 stress 0.04596287
Run 8 stress 0.05277352
Run 9 stress 0.04661559
Run 10 stress 0.04677193
Run 11 stress 0.05037888
Run 12 stress 0.04614352
Run 13 stress 0.05694953
Run 14 stress 0.05074769
Run 15 stress 0.04656415
Run 16 stress 0.04582944
Run 17 stress 0.05573858
Run 18 stress 0.05206533
Run 19 stress 0.05362203
Run 20 stress 0.05641374
*** Best solution was not repeated -- monoMDS stopping criteria:
20: no. of iterations >= maxit
Run 0 stress 0.05200548
Run 1 stress 0.05383954
Run 2 stress 0.05610581
Run 3 stress 0.05604351
Run 4 stress 0.05521967
Run 5 stress 0.0580235
Run 6 stress 0.05492497
Run 7 stress 0.05614902
Run 8 stress 0.05483286
Run 9 stress 0.05437525
Run 10 stress 0.05562016
Run 11 stress 0.05585861
Run 12 stress 0.05535613
Run 13 stress 0.05544325
Run 14 stress 0.05372001
Run 15 stress 0.05466198
Run 16 stress 0.05531106
Run 17 stress 0.05431961
Run 18 stress 0.0560297
Run 19 stress 0.05508792
Run 20 stress 0.05795377
*** Best solution was not repeated -- monoMDS stopping criteria:
20: no. of iterations >= maxit
Run 0 stress 0.05611373
Run 1 stress 0.05740888
Run 2 stress 0.05707433
Run 3 stress 0.05944355
Run 4 stress 0.05722508
Run 5 stress 0.05829266
Run 6 stress 0.05845288
Run 7 stress 0.05789314
Run 8 stress 0.06205236
Run 9 stress 0.05767133
Run 10 stress 0.06056325
Run 11 stress 0.06041551
Run 12 stress 0.05831889
Run 13 stress 0.05920586
Run 14 stress 0.05838745
Run 15 stress 0.05855115
Run 16 stress 0.05905985
Run 17 stress 0.05906832
Run 18 stress 0.05790463
Run 19 stress 0.05859275
Run 20 stress 0.06104219
*** Best solution was not repeated -- monoMDS stopping criteria:
20: no. of iterations >= maxit
598 ASVs and 971 samples.
Can we create a mixture of those two distances and see if a pattern emerges?
Run 0 stress 0.04026351
Run 1 stress 0.04973186
Run 2 stress 0.05058196
Run 3 stress 0.05179752
Run 4 stress 0.04693137
Run 5 stress 0.04758069
Run 6 stress 0.0511412
Run 7 stress 0.04596287
Run 8 stress 0.05277352
Run 9 stress 0.04661559
Run 10 stress 0.04677193
Run 11 stress 0.05037888
Run 12 stress 0.04614352
Run 13 stress 0.05694953
Run 14 stress 0.05074769
Run 15 stress 0.04656415
Run 16 stress 0.04582944
Run 17 stress 0.05573858
Run 18 stress 0.05206533
Run 19 stress 0.05362203
Run 20 stress 0.05641374
*** Best solution was not repeated -- monoMDS stopping criteria:
20: no. of iterations >= maxit
Haven’t compared it yet, but it could be an idea…
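One simple way to mix two distance matrices is a convex combination after putting both on a common scale. The sketch below is only one possible reading of the question: the name `mix_distances`, the max-scaling, and the weight `alpha` are assumptions, not something the notebook specifies.

```python
import numpy as np

def mix_distances(d_a, d_b, alpha=0.5):
    """Convex combination of two distance matrices (hypothetical helper).

    Each matrix is divided by its largest entry first, so that neither
    dominates purely because of its units.
    """
    d_a = np.asarray(d_a, dtype=float)
    d_b = np.asarray(d_b, dtype=float)
    if d_a.shape != d_b.shape:
        raise ValueError("distance matrices must have the same shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if d_a.max() == 0.0 or d_b.max() == 0.0:
        raise ValueError("a distance matrix with all zeros cannot be scaled")
    return alpha * d_a / d_a.max() + (1.0 - alpha) * d_b / d_b.max()

# Toy example with two 3 x 3 distance matrices.
d_a = np.array([[0, 1, 2], [1, 0, 2], [2, 2, 0]])
d_b = np.array([[0, 4, 2], [4, 0, 4], [2, 4, 0]])
mixed = mix_distances(d_a, d_b, alpha=0.5)
print(mixed)
```

Sweeping `alpha` from 0 to 1 and re-running the ordination at each value would show how quickly the pattern moves from one distance to the other.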
Here the idea is to aggregate some samples. We would still have compositions of ASVs, but instead of individual samples we would have aggregated units that are less sparse and have fewer dimensions. The most natural aggregations could be:
Compositions of ASVs, under Longhurst provinces
Compositions of ASVs, under Longhurst provinces combined with different depths
Cruises?
Chunks of latitudes, and chunks of depths.
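Any of the aggregations listed above boils down to summing the ASV counts of all samples that share a label and re-closing each row to proportions. A minimal NumPy sketch, assuming a samples-by-ASVs count matrix and one grouping label per sample (the function name and the labels `"P1"`, `"P2"` are made up for illustration):

```python
import numpy as np

def aggregate_compositions(counts, groups):
    """Sum ASV counts per group and return per-group proportions.

    counts: array of shape (n_samples, n_asvs)
    groups: one label per sample (e.g. a province, or province + depth bin)
    """
    counts = np.asarray(counts, dtype=float)
    groups = np.asarray(groups)
    levels, inverse = np.unique(groups, return_inverse=True)

    # Accumulate each sample's counts into the row of its group.
    summed = np.zeros((len(levels), counts.shape[1]))
    np.add.at(summed, inverse, counts)

    # Close each aggregated row so that it sums to 1.
    totals = summed.sum(axis=1, keepdims=True)
    return levels, summed / totals

# Toy example: three samples, three ASVs, two hypothetical groups.
counts = [[1, 0, 3],
          [1, 2, 0],
          [0, 0, 4]]
levels, comp = aggregate_compositions(counts, ["P1", "P1", "P2"])
print(levels)
print(comp)
```

For the combined aggregations, the label can simply be a joined string such as province plus depth bin, so the same function covers all four options.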
So for each aggregation we should take a look at the MDS, t-SNE, and UMAP embeddings of the Aitchison distances (or of a distance matrix computed after the CLR transformation).
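The two options are closely related: the Aitchison distance between two compositions is the Euclidean distance between their CLR-transformed vectors. A minimal NumPy sketch follows; the pseudocount used to handle zeros is an assumption (the notebook does not say how zeros are treated), and the function names are illustrative.

```python
import numpy as np

def clr(compositions, pseudocount=0.5):
    """Centred log-ratio transform, applied row by row.

    The pseudocount is an assumed way of avoiding log(0).
    """
    x = np.asarray(compositions, dtype=float) + pseudocount
    log_x = np.log(x)
    return log_x - log_x.mean(axis=1, keepdims=True)

def aitchison_distance(compositions, pseudocount=0.5):
    """Pairwise Aitchison distances: Euclidean distances in CLR space."""
    z = clr(compositions, pseudocount)
    sq = (z ** 2).sum(axis=1)
    # Squared distances from the Gram matrix; avoids an n x n x p array.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (z @ z.T)
    d = np.sqrt(np.clip(d2, 0.0, None))  # clip tiny negative round-off
    np.fill_diagonal(d, 0.0)
    return d

# Toy example without zeros, so no pseudocount is needed.
x = np.array([[1.0, 2.0, 4.0],
              [10.0, 20.0, 40.0],   # same composition as row 0, rescaled
              [4.0, 2.0, 1.0]])
d = aitchison_distance(x, pseudocount=0.0)
print(d)
```

Rows 0 and 1 differ only by a constant factor, so their distance is zero up to round-off; this scale invariance is the reason to prefer CLR-based distances for compositions. The resulting matrix can be handed to whichever MDS, t-SNE, or UMAP implementation is in use.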
So first let’s do a bit of EDA to see what we can expect.
Run 0 stress 0.08699511
Run 1 stress 0.09028113
Run 2 stress 0.09059218
Run 3 stress 0.08996065
Run 4 stress 0.09032197
Run 5 stress 0.090564
Run 6 stress 0.09053852
Run 7 stress 0.09461317
Run 8 stress 0.09119436
Run 9 stress 0.09347122
Run 10 stress 0.09054701
Run 11 stress 0.09077451
Run 12 stress 0.09002654
Run 13 stress 0.08790095
Run 14 stress 0.09047252
Run 15 stress 0.08853844
Run 16 stress 0.08927363
Run 17 stress 0.09012554
Run 18 stress 0.09072616
Run 19 stress 0.09376992
Run 20 stress 0.09012315
*** Best solution was not repeated -- monoMDS stopping criteria:
20: no. of iterations >= maxit
Run 0 stress 0.08528086
Run 1 stress 0.09062615
Run 2 stress 0.09137155
Run 3 stress 0.08906065
Run 4 stress 0.08833639
Run 5 stress 0.08865741
Run 6 stress 0.08944729
Run 7 stress 0.08944238
Run 8 stress 0.0890103
Run 9 stress 0.09019254
Run 10 stress 0.09101819
Run 11 stress 0.08887575
Run 12 stress 0.08926578
Run 13 stress 0.08878498
Run 14 stress 0.09129564
Run 15 stress 0.08951457
Run 16 stress 0.08989411
Run 17 stress 0.08921273
Run 18 stress 0.0885944
Run 19 stress 0.08986336
Run 20 stress 0.08904013
*** Best solution was not repeated -- monoMDS stopping criteria:
20: no. of iterations >= maxit
# Creating Clusters
Let’s now create and evaluate the clusters using these two ocean slices that we have.